Multi-view learning is a learning problem that exploits multiple representations of an object to mine valuable knowledge and improve the performance of learning algorithms, and one of its important directions is subspace learning. As we know, an autoencoder is a deep learning method that can learn the latent features of the original data by reconstructing the input. Based on this, we propose a new algorithm called Autoencoder-based Co-training Multi-View Learning (ACMVL), which exploits both complementarity and consistency and finds a joint latent feature representation of the multiple views. The algorithm has two stages: the first trains an autoencoder for each view, and the second trains a supervised network. Interestingly, the two stages partially share weights and assist each other through a co-training process. According to the experimental results, we can learn a good latent feature representation, and the autoencoder of each view has stronger reconstruction ability than a traditional autoencoder.
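A minimal sketch of the two-stage idea described above, assuming a PyTorch-style implementation: each view gets its own autoencoder, and a supervised head consumes the encoders' latent codes, so gradients from the classification stage also refine the shared encoder weights. Class names, layer sizes, and the concatenation-based fusion are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class ViewAutoencoder(nn.Module):
    """Stage 1: one autoencoder per view learns a latent code by reconstruction."""
    def __init__(self, in_dim, latent_dim):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                     nn.Linear(128, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                     nn.Linear(128, in_dim))

    def forward(self, x):
        z = self.encoder(x)
        return z, self.decoder(z)

class ACMVLSketch(nn.Module):
    """Stage 2: a supervised head on the joint latent code; because it reuses
    the view encoders, supervised gradients also update the shared weights."""
    def __init__(self, view_dims, latent_dim, n_classes):
        super().__init__()
        self.autoencoders = nn.ModuleList(
            ViewAutoencoder(d, latent_dim) for d in view_dims)
        self.classifier = nn.Linear(latent_dim * len(view_dims), n_classes)

    def forward(self, views):
        zs, recons = zip(*(ae(x) for ae, x in zip(self.autoencoders, views)))
        joint = torch.cat(zs, dim=1)            # joint latent feature representation
        return self.classifier(joint), recons   # co-train on both losses
```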
Multi-view learning accomplishes the task objectives of classification by leveraging the relationships among different views of the same object. Most existing methods usually focus on the consistency and complementarity among multiple views, but not all of this information is useful for classification tasks; rather, it is the specific discriminative information that plays an important role. Zhong Zhang et al. explored the discriminative and non-discriminative information existing in the common and view-specific parts of different views via joint non-negative matrix factorization. In this paper, we improve this algorithm by using the cross-entropy loss function to construct a better objective function. Finally, we achieve better classification results than the original implementation on the same datasets and demonstrate superiority over many state-of-the-art algorithms.
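The concrete change the abstract describes is introducing a cross-entropy term over label predictions into the joint-NMF objective. A minimal sketch of such a term, where `scores` (class scores derived from the factorized representation) and the weight `beta` are hypothetical names:

```python
import torch.nn.functional as F

def supervised_term(scores, labels, beta=1.0):
    # scores: (n_samples, n_classes) derived from the factorized
    # representation; labels: (n_samples,) ground-truth class indices.
    # Cross-entropy stands in for the original discriminative term.
    return beta * F.cross_entropy(scores, labels)
```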
Multi-view learning can cover all the features of data samples more comprehensively, so it has attracted wide attention. Traditional subspace clustering methods, such as sparse subspace clustering (SSC) and low-rank subspace clustering (LRSC), cluster the affinity matrix of a single view, ignoring the problem of fusion between views. In this article, we propose a new Multi-view Subspace Adaptive Learning based on Attention and Autoencoder (MSALAA). This method combines a deep autoencoder with the alignment of the self-representations of the various views in Multi-view Low-Rank Sparse Subspace Clustering (MLRSSC), which not only improves the capacity for non-linear fitting but also satisfies the consistency and complementarity principles of multi-view learning. We empirically observe significant improvements over existing baseline methods on six real-life datasets.
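A sketch of two of the ingredients named above, under stated assumptions: an MLRSSC-style self-expression layer (each sample's latent code reconstructed from the others via a learned coefficient matrix) and a softmax-weighted fusion standing in for the attention step. The scoring function and shapes are illustrative, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class SelfExpression(nn.Module):
    def __init__(self, n_samples):
        super().__init__()
        self.C = nn.Parameter(1e-4 * torch.randn(n_samples, n_samples))

    def forward(self, z):
        # Zero the diagonal so no sample reconstructs itself, then Z_hat = C Z.
        C = self.C - torch.diag(torch.diagonal(self.C))
        return C @ z

def attention_fuse(zs):
    # zs: list of (n_samples, d) view embeddings; derive dynamic per-view weights.
    scores = torch.stack([z.norm(dim=1).mean() for z in zs])
    weights = torch.softmax(scores, dim=0)
    return sum(w * z for w, z in zip(weights, zs))
```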
Multi-view learning attempts to generate models with better performance by exploiting the consensus and/or complementarity among multi-view data. However, in terms of complementarity, most existing approaches can only find single complementary information rather than complementary information with diversity. In this paper, to exploit complementarity and consistency simultaneously and to tap the potential of deep learning in mining diversity-promoting complementarity for multi-view representation learning, we propose a novel supervised multi-view representation learning algorithm, called Self-Attention Multi-View network with Diversity-Promoting Complementarity (SAMVDPC), which exploits consistency through a group of encoders and uses self-attention to find the complementary information entailing diversity. Extensive experiments conducted on eight real-world datasets demonstrate the effectiveness of our proposed method and show its superiority over several baseline methods that consider only single complementary information.
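A hedged sketch of this idea: a group of encoders yields one token per view, and self-attention over that set lets views attend to each other, which is one way to surface diverse complementary information. Dimensions and the mean-pooled fusion are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MultiViewSelfAttention(nn.Module):
    def __init__(self, view_dims, d_model=64, n_heads=4):
        super().__init__()
        self.encoders = nn.ModuleList(nn.Linear(d, d_model) for d in view_dims)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, views):
        # views: list of (batch, d_v) tensors -> tokens: (batch, n_views, d_model)
        tokens = torch.stack([enc(x) for enc, x in zip(self.encoders, views)], dim=1)
        out, _ = self.attn(tokens, tokens, tokens)  # views attend to each other
        return out.mean(dim=1)                      # fused representation
```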
In this paper, we propose a novel Attentive Multi-View Deep Subspace Nets (AMVDSN), which deeply explores the underlying consistent and view-specific information among multiple views and fuses them by considering the dynamic contribution of each view, obtained through an attention mechanism. Unlike most multi-view subspace learning methods, which directly reconstruct data points on the original data or consider only consistency or complementarity when learning representations in deep or shallow space, our proposed method seeks a joint latent representation that explicitly considers both the consensus and the view-specific information among multiple views, and then performs subspace clustering on the learned joint latent representation. Moreover, since different views contribute differently to representation learning, we introduce an attention mechanism to derive dynamic weights for each view, which performs much better than previous fusion methods in the field of multi-view subspace clustering. The proposed algorithm is intuitive and, thanks to its neural network framework, can be easily optimized with stochastic gradient descent (SGD), which also provides strong non-linear characterization ability compared with traditional subspace clustering methods. Experimental results on seven real-world datasets demonstrate the effectiveness of our proposed algorithm against some state-of-the-art subspace learning approaches.
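A hedged sketch of how such an objective can be composed and then optimized with SGD: per-view reconstruction, a self-expression term on the attention-fused joint latent representation, and a sparsity regularizer on the coefficient matrix C. The loss choices and weights lambda1/lambda2 are illustrative, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def amvdsn_style_loss(recons, views, joint_z, C, lambda1=1.0, lambda2=0.1):
    reconstruction = sum(F.mse_loss(r, x) for r, x in zip(recons, views))
    self_expression = F.mse_loss(C @ joint_z, joint_z)  # Z ~ C Z on joint latent
    regularizer = C.norm(p=1)                           # sparse coefficients
    return reconstruction + lambda1 * self_expression + lambda2 * regularizer
```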
In this chapter, we review and discuss the transformation of AI technology in HCI/UX work and assess how AI technology will change how we do the work. We first discuss how AI can be used to enhance the result of user research and design evaluation. We then discuss how AI technology can be used to enhance HCI/UX design. Finally, we discuss how AI-enabled capabilities can improve UX when users interact with computing systems, applications, and services.
An increasing number of public datasets have shown a marked clinical impact on assessing anatomical structures. However, each of the datasets is small, partially labeled, and rarely investigates severe tumor subjects. Moreover, current models are limited to segmenting specific organs/tumors and cannot be extended to novel domains and classes. To tackle these limitations, we introduce embedding learned from Contrastive Language-Image Pre-training (CLIP) to segmentation models, dubbed the CLIP-Driven Universal Model. The Universal Model can better segment 25 organs and 6 types of tumors by exploiting the semantic relationship between abdominal structures. The model is developed from an assembly of 14 datasets with 3,410 CT scans and evaluated on 6,162 external CT scans from 3 datasets. We rank first on the public leaderboard of the Medical Segmentation Decathlon (MSD) and achieve state-of-the-art results on Beyond The Cranial Vault (BTCV). Compared with dataset-specific models, the Universal Model is computationally more efficient (6x faster), generalizes better to CT scans from varying sites, and shows stronger transfer learning performance on novel tasks. The design of the CLIP embedding enables the Universal Model to be easily extended to new classes without catastrophically forgetting the previously learned classes.
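One way to picture the central mechanism, text-embedding-conditioned segmentation, is as a tiny hypernetwork: a fixed text embedding per class (e.g., from CLIP's text encoder) is mapped to the parameters of a per-class 1x1x1 convolutional head, so adding a class amounts to adding a prompt. The shapes and the `controller` name below are assumptions for illustration, not the authors' exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CLIPConditionedHead(nn.Module):
    def __init__(self, text_dim=512, feat_dim=48):
        super().__init__()
        # Maps a class's text embedding to 1x1x1 conv weights plus a bias.
        self.controller = nn.Linear(text_dim, feat_dim + 1)
        self.feat_dim = feat_dim

    def forward(self, voxel_feats, class_text_emb):
        # voxel_feats: (B, feat_dim, D, H, W); class_text_emb: (text_dim,)
        params = self.controller(class_text_emb)
        weight = params[:self.feat_dim].view(1, self.feat_dim, 1, 1, 1)
        bias = params[self.feat_dim:]                # shape (1,)
        return F.conv3d(voxel_feats, weight, bias)   # (B, 1, D, H, W) mask logits
```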
Recent advances in self-supervised learning (SSL) in computer vision are primarily comparative, aiming to preserve invariant and discriminative semantics in latent representations by comparing siamese image views. However, the preserved high-level semantics do not contain enough local information, which is vital in medical image analysis (e.g., image-based diagnosis and tumor segmentation). To mitigate the locality problem of comparative SSL, we propose to incorporate the task of pixel restoration for explicitly encoding more pixel-level information into high-level semantics. We also address the preservation of scale information, a powerful tool in aiding image understanding but one that has not drawn much attention in SSL. The resulting framework can be formulated as a multi-task optimization problem on the feature pyramid. Specifically, we conduct multi-scale pixel restoration and siamese feature comparison in the pyramid. In addition, we propose non-skip U-Net to build the feature pyramid and develop sub-crop to replace multi-crop in 3D medical imaging. The proposed unified SSL framework (PCRLv2) surpasses its self-supervised counterparts on various tasks, including brain tumor segmentation (BraTS 2018), chest pathology identification (ChestX-ray, CheXpert), pulmonary nodule detection (LUNA), and abdominal organ segmentation (LiTS), sometimes outperforming them by large margins with limited annotations.
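A hedged sketch of the multi-task objective on a feature pyramid: at each scale, restore pixels from an augmented view and compare siamese features between the two views. The loss choices and the weight `alpha` are illustrative; the paper's exact formulation may differ.

```python
import torch.nn.functional as F

def pyramid_ssl_loss(restored, targets, feats_a, feats_b, alpha=1.0):
    # restored/targets: per-scale pixel predictions and restoration targets;
    # feats_a/feats_b: per-scale features from the two siamese views.
    total = 0.0
    for r, t, fa, fb in zip(restored, targets, feats_a, feats_b):
        restoration = F.mse_loss(r, t)                               # pixel-level
        comparison = 1 - F.cosine_similarity(fa, fb, dim=1).mean()   # semantic
        total = total + restoration + alpha * comparison
    return total
```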
We present Muse, a text-to-image Transformer model that achieves state-of-the-art image generation performance while being significantly more efficient than diffusion or autoregressive models. Muse is trained on a masked modeling task in discrete token space: given the text embedding extracted from a pre-trained large language model (LLM), Muse is trained to predict randomly masked image tokens. Compared to pixel-space diffusion models, such as Imagen and DALL-E 2, Muse is significantly more efficient due to the use of discrete tokens and requiring fewer sampling iterations; compared to autoregressive models, such as Parti, Muse is more efficient due to the use of parallel decoding. The use of a pre-trained LLM enables fine-grained language understanding, translating to high-fidelity image generation and the understanding of visual concepts such as objects, their spatial relationships, pose, cardinality etc. Our 900M parameter model achieves a new SOTA on CC3M, with an FID score of 6.06. The Muse 3B parameter model achieves an FID of 7.88 on zero-shot COCO evaluation, along with a CLIP score of 0.32. Muse also directly enables a number of image editing applications without the need to fine-tune or invert the model: inpainting, outpainting, and mask-free editing. More results are available at https://muse-model.github.io
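A toy sketch of the masked-token objective described above: a Transformer conditioned on a text embedding predicts the ids of masked discrete image tokens, and all positions are scored in one pass, which is what enables parallel decoding. Vocabulary size, depth, and conditioning-by-addition are illustrative assumptions, not Muse's actual configuration.

```python
import torch
import torch.nn as nn

class MaskedTokenSketch(nn.Module):
    def __init__(self, vocab=8192, d_model=512, text_dim=512):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab + 1, d_model)  # last id = [MASK]
        self.text_proj = nn.Linear(text_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(d_model, vocab)

    def forward(self, tokens, text_emb):
        # tokens: (B, N) ids, some set to the mask id; text_emb: (B, text_dim).
        x = self.tok_emb(tokens) + self.text_proj(text_emb).unsqueeze(1)
        return self.head(self.backbone(x))  # (B, N, vocab) logits for all slots
```

Training would then apply cross-entropy only at the masked positions, with the random masking ratio drawn per example.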
Feature selection helps reduce data acquisition costs in ML, but the standard approach is to train models with static feature subsets. Here, we consider the dynamic feature selection (DFS) problem where a model sequentially queries features based on the presently available information. DFS is often addressed with reinforcement learning (RL), but we explore a simpler approach of greedily selecting features based on their conditional mutual information. This method is theoretically appealing but requires oracle access to the data distribution, so we develop a learning approach based on amortized optimization. The proposed method is shown to recover the greedy policy when trained to optimality and outperforms numerous existing feature selection methods in our experiments, thus validating it as a simple but powerful approach for this problem.
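A hedged sketch of an amortized greedy selection policy of this kind: a network scores every not-yet-queried feature given the current observations and mask, and the top-scoring feature is queried next. When trained to optimality, such scores are meant to track conditional mutual information; the training objective is omitted here and all names are illustrative.

```python
import torch
import torch.nn as nn

class GreedyDFSPolicy(nn.Module):
    def __init__(self, n_features, hidden=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * n_features, hidden), nn.ReLU(),
                                 nn.Linear(hidden, n_features))

    def select_next(self, x_obs, mask):
        # x_obs: feature values with zeros where unobserved; mask: 1 if queried.
        scores = self.net(torch.cat([x_obs, mask], dim=-1))
        scores = scores.masked_fill(mask.bool(), float('-inf'))  # no re-queries
        return scores.argmax(dim=-1)  # index of the next feature to acquire
```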